Proteogenomic mapping as a complementary method to perform genome annotation.

Authors

  • Jacob D Jaffe
  • Howard C Berg
  • George M Church
Abstract

The accelerated rate of genomic sequencing has led to an abundance of completely sequenced genomes. Annotation of the open reading frames (ORFs) (i.e., gene prediction) in these genomes is an important task and is most often performed computationally based on features in the nucleic acid sequence. Using recent advances in proteomics, we set out to predict the set of ORFs for an organism based principally on expressed protein-based evidence. Using a novel search strategy, we mapped peptides detected in a whole-cell lysate of Mycoplasma pneumoniae onto a genomic scaffold and extended these "hits" into ORFs bound by traditional genetic signals to generate a "proteogenomic map". We were able to generate an ORF model for M. pneumoniae strain FH using proteomic data with a high correlation to models based on sequence features. Ultimately, we detected over 81% of the genomically predicted ORFs in M. pneumoniae strain M129 (the originally sequenced strain). We were also able to detect several new ORFs not originally predicted by genomic methods, various N-terminal extensions, and some evidence that would suggest that certain predicted ORFs are bogus. Some of these differences may be a result of the strain analyzed but demonstrate the robustness of protein analysis across closely related genomes. This technique is a cost-effective means to add value to genome annotation, and a prerequisite for proteome quantitation and in vivo interaction measures.
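The mapping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' search strategy, only an assumption-laden illustration of the general idea: translate the genome in all six reading frames, locate a detected peptide, and extend the hit to the flanking stop codons. The function names (`revcomp`, `translate`, `map_peptide`) and the toy sequence are hypothetical. The sketch assumes NCBI translation table 4, in which TGA encodes tryptophan, as used by Mycoplasma, and it simplifies "traditional genetic signals" to a stop-to-stop window without searching for a start codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
CODON_TABLE["TGA"] = "W"  # assumption: translation table 4 (TGA = Trp)


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def map_peptide(genome, peptide):
    """Return (strand, frame, orf_protein) for every six-frame hit.

    Each hit is extended upstream to just after the previous stop codon
    and downstream to just before the next one (a stop-to-stop window).
    """
    hits = []
    for strand, dna in (("+", genome), ("-", revcomp(genome))):
        for frame in range(3):
            protein = translate(dna[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = protein.rfind("*", 0, pos) + 1
                end = protein.find("*", pos)
                if end == -1:
                    end = len(protein)
                hits.append((strand, frame, protein[start:end]))
                pos = protein.find(peptide, pos + 1)
    return hits


# Toy example: the peptide KWWP spans a TGA codon read as Trp.
toy_genome = "TAGATGAAATGGTGACCCTAAGG"
print(map_peptide(toy_genome, "KWWP"))  # [('+', 0, 'MKWWP')]
```

A real pipeline would match tandem mass spectra against the six-frame translation rather than exact peptide strings, and would refine the upstream boundary using start codons; both steps are omitted here.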

Similar resources

Proteogenomics in microbiology: taking the right turn at the junction of genomics and proteomics.

High-accuracy and high-throughput proteomic methods have completely changed the way we can identify and characterize proteins. MS-based proteomics can now provide a unique supplement to genomic data and add a new level of information to the interpretation of genomic sequences. Proteomics-driven genome annotation has become especially relevant in microbiology where genomes are sequenced on a dai...

N-terminal Proteomics Assisted Profiling of the Unexplored Translation Initiation Landscape in Arabidopsis thaliana

Proteogenomics is an emerging research field that still lacks a uniform method of analysis. Proteogenomic studies that combine N-terminal proteomics and ribosome profiling suggest that a high number of protein start sites are currently missing in genome annotations. We constructed a proteogenomic pipeline specific for the analysis of N-terminal proteomics data, with the aim of discovering ...

A proteogenomic survey of the Medicago truncatula genome.

Peptide sequencing by computational assignment of tandem mass spectra to a database of putative protein sequences provides an independent approach to confirming or refuting protein predictions based on large-scale DNA and RNA sequencing efforts. This use of mass spectrometrically-derived sequence data for testing and refining predicted gene models has been termed proteogenomics. We report herei...

Proteomic analysis and genome annotation of Pichia pastoris, a recombinant protein expression host.

Pichia pastoris is a widely used eukaryotic host for production of recombinant proteins. We performed a proteogenomic analysis using high resolution Fourier transform MS to characterize the proteome of the GS115 strain. Our analysis resulted in identification of 46,889 unique peptides mapping to 3914 unique protein groups, which corresponds to ∼ 80% of the predicted genes. In addition, our prot...

An integrated analysis and database system for full-length cDNA.

An annotation and database system for full-length cDNA sequences was developed. Its components are an ORF annotation system, a functional annotation system based on database search results, a mapping annotation system, and an integrated retrieval and display system. In the ORF annotation system integrated analyses using conventional tools are performed and useful retrieval interf...

Journal title:
  • Proteomics

Volume 4, Issue 1

Pages -

Publication date 2004